Attentional Pooling for Action Recognition
Authors
Abstract
We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and it gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It significantly improves over a state-of-the-art base architecture on three standard action recognition benchmarks spanning still images and videos, and it establishes a new state of the art on the MPII dataset with a 12.5% relative improvement. We also analyze our attention module extensively, both empirically and analytically. For the analytical part, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.
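The abstract's claim that bottom-up and top-down attention arise as a low-rank approximation of bilinear pooling can be checked numerically. The sketch below assumes a feature map `X` with `n` spatial locations and `f` channels, a bilinear classifier score of the form Tr(XᵀXW), and a rank-1 factorization W = a bᵀ, where a shared vector `a` plays the bottom-up (class-agnostic) role and a per-class vector `b` plays the top-down (class-specific) role. The function names, sizes, and the rank-1 choice are illustrative assumptions, not the authors' code.

```python
import numpy as np

def bilinear_rank1_score(X, a, b):
    """Full bilinear pooling score Tr(X^T X W) with a rank-1 weight W = a b^T.

    Costs O(n f^2) because it forms the f x f second-order matrix X^T X.
    """
    W = np.outer(a, b)                  # f x f weight matrix of rank 1
    return float(np.trace(X.T @ X @ W))

def attentional_pooling_score(X, a, b):
    """Equivalent attention form: Tr(X^T X a b^T) = (X b)^T (X a).

    Costs O(n f): two attention maps over the n locations, multiplied
    elementwise and summed.
    """
    bottom_up = X @ a                   # (n,) class-agnostic saliency map
    top_down = X @ b                    # (n,) class-specific score map
    return float(bottom_up @ top_down)

def class_scores(X, a, B):
    """Scores for K classes: one shared bottom-up vector a, and one
    top-down column of B (shape f x K) per class (hypothetical layout)."""
    saliency = X @ a                    # (n,) shared bottom-up map
    return saliency @ (X @ B)           # (K,) one score per class

# Hypothetical sizes: a 7x7 feature map with 16 channels and 5 classes.
rng = np.random.default_rng(0)
n, f, K = 49, 16, 5
X = rng.standard_normal((n, f))
a = rng.standard_normal(f)
B = rng.standard_normal((f, K))

full = np.array([bilinear_rank1_score(X, a, B[:, k]) for k in range(K)])
fast = class_scores(X, a, B)
print(np.allclose(full, fast))          # True
```

The design point the sketch illustrates is cost: the bilinear form builds an f × f matrix, while the factored form only computes two length-n attention maps, so the attention module adds little computation to the base network.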
Related Articles
The Effect of Attentional Focus on Gaze Behavior and Accuracy of Dart Throwing: The Attentional Task Demands Problem
Focus of attention and quiet eye (QE), two variables that affect aiming-task performance, have interested psychologists and sport science researchers in recent decades. The purpose of this study was to investigate the effect of attentional instructions on the gaze behavior and dart-throwing accuracy of novices under low and high task load. In a semi-experimental design with re...
Order-aware Convolutional Pooling for Video Based Action Recognition
Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame. The pooling methods that they adopt, however, usually completely or partially neglect the dynamic information contained in the temporal domain, which may undermine the discriminative power of the resulting video representation since the video sequence ...
Second-order Temporal Pooling for Action Recognition
Most successful deep learning models for action recognition generate predictions for short video clips, which are later aggregated into a longer time-frame action descriptor by computing a statistic over these predictions. Zeroth-order (max) or first-order (average) statistics are commonly used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel e...
Eigen Evolution Pooling for Human Action Recognition
We introduce Eigen Evolution Pooling, an efficient method to aggregate a sequence of feature vectors. Eigen evolution pooling is designed to produce compact feature representations for a sequence of feature vectors while preserving as much information about the sequence as possible, especially how the features evolve over time. Eigen evolution pooling is a general pool...
The Effect of Holistic Face Processing on Attentional Bias toward Emotional Faces in Anxious Children
This study examined the effect of holistic face processing and trait anxiety on children's attentional biases toward schematic natural and jumbled emotional faces (angry, happy, neutral). Participants were selected based on their scores on the Trait Anxiety Inventory for Children (Spielberger, 1973) and the results of a semi-structured interview. 30 high- and 30 low ...